2023-03-13

R crash course

Important links

The Team

Getting to know you

Aims

  • R for data science (Hadley Wickham)
  • Give you a jump start to
    • exchange data between R and spreadsheet programs such as Excel
    • manipulate and display the data
    • run simple statistical analysis (t-test, non-parametric tests, basic ANOVA, basic linear regression)
    • create reports from your analysis in a number of formats
  • Give you good habits
  • Proceed along a helix

Course files

Aims for today

  • Introducing RStudio as an environment for developing your own programs
  • Concepts on good coding practice
  • Language basics of R (variables, operators, functions …)

How to start

Creating a new project

  • In RStudio, go to File -> New Project
  • When the dialog window appears, select New Directory -> New Project

A project is not a file but a directory that contains all relevant files (scripts, documentation, Excel files with the source data, exported documents…). You can put anything in this directory.

It is a very good idea to save all data and metadata related to your project in this directory.
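A minimal sketch of why this helps, assuming the project directory is your working directory (RStudio sets it when you open a project): you can then refer to files with relative paths, and the project keeps working when the folder is copied to another computer. Here a temporary directory stands in for the project, and the `data` folder, the file `measurements.csv` and its values are hypothetical examples.

```r
# A temporary directory stands in for the project directory
# (assumption: in a real RStudio project the working directory is already set)
project_dir <- tempfile("my_project")
dir.create(project_dir)
old_wd <- setwd(project_dir)

# Hypothetical example data, saved in a sub-folder of the project
dir.create("data")
measurements <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
write.csv(measurements, "data/measurements.csv", row.names = FALSE)

# Read it back with a relative path: no "C:/Users/..." needed
measurements_again <- read.csv("data/measurements.csv")
print(measurements_again)

# Return to the original working directory
setwd(old_wd)
```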

Example R session

When things don’t work…

Error 1

Error 2

  • Don’t panic!

  • Do not ignore error messages. Very often, they tell you exactly what the problem is.

  • Google it!
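As an example of an error message that tells you what is wrong: passing text to `sqrt()`, which expects a number, stops with an error saying that the argument is non-numeric. In the console you would simply see the message; `tryCatch()` is used here only to capture it in a variable.

```r
# Passing text to a function that expects a number produces an error
result <- tryCatch(
  sqrt("4"),
  error = function(e) conditionMessage(e)  # keep only the message text
)
print(result)

# The message names the problem (the argument is not numeric),
# so converting the text to a number fixes it
sqrt(as.numeric("4"))
```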

The RStudio workplace

  • Windows
  • Menus
  • Workspace

Workspaces

A workspace is essentially a folder containing a few special files in which R stores project-specific data.

  • .Rhistory (hidden file): a text file containing all the commands you have issued
  • .RData (hidden file): a binary file containing your workspace (all the variables you created)
  • <filename>.Rproj: the RStudio project file, a text file containing some RStudio-specific settings
  • Anything else must be saved by you
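A minimal sketch of what the .RData file does: `save.image()` writes every variable in the workspace to a file and `load()` restores them. A temporary file is used so that a real .RData in your project is not overwritten; the variable names are just examples.

```r
# Create a few variables in the workspace
heights <- c(1.62, 1.75, 1.80)
label <- "pilot study"

# Save all variables to a file (a temporary file here, instead of .RData)
ws_file <- tempfile(fileext = ".RData")
save.image(file = ws_file)

# Remove the variables: they are gone from the workspace
rm(heights, label)
exists("heights")

# Restore everything from the file
load(ws_file)
heights
```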

Examples of R applications

  • Why programming?
  • Why R?
  • Alternatives: Python, MATLAB, other statistical languages

A few notes on R

  • R vs MATLAB
  • “There is more than one way of doing it” (but one way will usually be optimal)
  • Tidyverse vs standard R (demo)
  • ggplot vs basic plots (demo)
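To illustrate "more than one way of doing it" (this is an illustrative example, not the demo from the session): three ways to compute the sum of squares of a vector in base R. All give the same result, but the vectorized `sum(x^2)` is the shortest and most idiomatic.

```r
x <- c(1, 10, 20, 21, 5)

# Way 1: an explicit loop (works, but verbose)
total_loop <- 0
for (value in x) {
  total_loop <- total_loop + value^2
}

# Way 2: vectorized arithmetic, then sum()
total_vec <- sum(x^2)

# Way 3: a dot product via matrix multiplication; drop() turns the 1x1 matrix into a number
total_mat <- drop(x %*% x)

c(total_loop, total_vec, total_mat)   # 967 967 967
```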

R language basics

R language basics (demo)

## Anything starting with a '#' is a comment
## Assignment and creating variables 

a <- 2
name <- "Manuela"

## vectors: c() combines several values into one vector

a <- c(1, 7, 9)

## operators 

a <- 3 + 5 
b <- a * 7

## functions 

sum(c(1, 2, 3))
i <- length(a)
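A few more arithmetic and comparison operators, for reference. The values of `x` and `y` are arbitrary.

```r
x <- 17
y <- 5

x + y     # addition: 22
x - y     # subtraction: 12
x * y     # multiplication: 85
x / y     # division: 3.4
x^2       # power: 289
x %/% y   # integer division: 3
x %% y    # remainder (modulo): 2
x > y     # comparison: TRUE
x == y    # equality test: FALSE
```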

Try it out!

Exercises 1.1 Creating variables

create variables: a string, a number, a factor

  • create a string variable called “teacher” containing the word “manuela”
  • create a string variable called “teachers” containing two values: “manuela” and “january”
  • what does 1:5 do?
  • create a variable containing numbers from 1 to 10

Note: play with the computer and try things out!

Exercises 1.2 Vectorization

  • what happens when you add a number to a vector? (e.g. c(3, 1, 4) + 5)
  • what happens when you multiply a vector with a number?
  • add vector c(1, 2, 5) and vector c(10, 20, 30). What happens?
  • type length(teacher); type length(teachers); what are the results?
  • type pi. What happens?
  • say you have three values, the diameters of three circles: 1, 5 and 13. You would like a vector containing the areas of these circles. What is the simplest way of doing that?

Exercises 1.2(b) Extra: recycling

(Extra exercise: only if you are bored!)

vec1 <- 1:10
vec2 <- 20:30

vec1 + vec2

This produces a warning message. Should you be worried? What happened?

What does the length() function do?

Comment: vectorization

In R, most values are vectors, and this is very convenient: arithmetic operators and many functions work on all elements of a vector at once.

When you assign a single value to a variable, as in counter <- 5, you are in fact creating a vector of length 1.
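A short check of this: a single number behaves like any other vector. Asking for an element beyond the end of a vector returns `NA` (a missing value) rather than an error.

```r
counter <- 5

length(counter)      # 1: a single number is a vector of length 1
is.vector(counter)   # TRUE
counter[1]           # indexing works as for any other vector: 5
counter[2]           # there is no second element: NA
```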

Exercise 1.3: Selecting elements from vectors

  • Type teachers[1]. Now type teachers[2]. What happens?

  • Now type the following. What exactly is happening here?

students <- c("Anna", "Bernie", "Claudia", "David")
sel <- c(1, 2)

students[sel]
students[sel] <- c("Arthur", "Beate")

  • What would students[1:3] do and why?

  • Can you do students[c(3, 1)]? What happens?

  • What happens when you do students[-1]? What does that - sign seem to do?

Factors

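Some variables are factors. Factors look almost like strings, but they behave slightly differently. Internally, R stores each value of a factor as an integer code that refers to one of its levels, which is why factors sometimes behave like numbers.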

In statistics, factors play an important role.

You don’t have to learn about factors yet, but you should know that they exist.

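teachers   <- c("manuela", "january")
f_teachers <- factor(teachers)
as.numeric(f_teachers)    # the integer codes of the levels
as.character(f_teachers)  # back to the original strings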
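The "factors behave like numbers" part can bite when numbers end up stored as a factor. A minimal illustration with made-up values: `as.numeric()` returns the level codes, not the numbers, and the levels are sorted as text.

```r
# Numbers stored as a factor
doses <- factor(c("10", "5", "10"))
levels(doses)                      # "10" "5": levels are sorted as text, not as numbers

# as.numeric() returns the internal level codes, NOT the original numbers
as.numeric(doses)                  # 1 2 1

# Convert to character first to get the numbers back
as.numeric(as.character(doses))    # 10 5 10
```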

Good coding practices

Remember: language is communication

  • Your code will be seen by others
  • And this is a good thing!
  • Documentation is important
  • Reproducibility matters

Documenting your code

  • Lousy documentation is better than none at all
  • Use spaces, empty lines, comments to structure your code
  • COMMENT, COMMENT, COMMENT
  • Document in plain text files and source code files

Writing code

Examples

Bad:

a <- 5

b <- c(1,10, 
20, 21, 5)

r<-sqrt(sum((b-mean(b))^2)/(a-1))

Better:

# example values for five samples
samples   <- c(1, 10, 20, 21, 5)
samples_n <- length(samples)

samples_mean <- mean(samples)
samples_devs <- samples - samples_mean

# sample variance: sum of squared deviations divided by (n - 1)
samples_var  <- sum(samples_devs^2) / (samples_n - 1)

# calculate standard deviation of samples
samples_sd   <- sqrt(samples_var)

# use the built-in function
samples_sd <- sd(samples)
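A quick way to check the hand-written calculation against the built-in `sd()`: `all.equal()` compares two numbers while tolerating tiny rounding differences. Forgetting `sum()` is an easy mistake to make; the last line shows that the result would then be a vector of five values instead of a single standard deviation.

```r
samples <- c(1, 10, 20, 21, 5)
n <- length(samples)

# Standard deviation by hand: square root of the sample variance
manual_sd  <- sqrt(sum((samples - mean(samples))^2) / (n - 1))
builtin_sd <- sd(samples)

all.equal(manual_sd, builtin_sd)   # TRUE

# Without sum(), every squared deviation is divided separately
length(sqrt((samples - mean(samples))^2 / (n - 1)))   # 5, not a single number
```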

Introducing functions (look ahead for day 2)

  • Almost everything that happens in R is a function call. A function takes a certain number of arguments and returns exactly one value (which may be a vector or something else)
  • Some special functions are operators, like + or -
  • Values can be assigned to variables (e.g. a <- 2).
  • There are different types of values, including vectors. There are numeric, character, logical and factor vectors.
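A small preview of these points: an operator called as an ordinary function, a user-defined function (the name `to_fahrenheit` is just an example) that returns one value, here a vector, and `class()` to check the basic vector types.

```r
# Operators are functions: both lines compute the same value
3 + 5
`+`(3, 5)

# A user-defined function: it takes one argument and returns exactly one value
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
to_fahrenheit(c(0, 100))         # 32 212

# The basic vector types and how to check them
class(c(1.5, 2))                 # "numeric"
class(c("a", "b"))               # "character"
class(c(TRUE, FALSE))            # "logical"
class(factor(c("a", "b")))       # "factor"
```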

Demo